In this assignment, we redraw a plot that we produced in the course “Reproducible Research” of this Data Science Specialization.
The underlying data are the number of steps taken by a group of people, recorded by time of day. We load and preprocess them just as we did then.
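library(tidyverse)
library(lubridate)
activity <- read_csv("data/activity.zip") %>%
  # Replace missing step counts with the mean of the same interval
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup() %>%
  # Number the days of the week from Monday = 1 to Sunday = 7
  mutate(weekday = wday(date, week_start = 1)) %>%
  # Label Monday to Friday as "weekday" and Saturday and Sunday as "weekend"
  mutate(weekdayOrWeekend = ifelse(weekday <= 5, "weekday", "weekend")) %>%
  mutate(weekdayOrWeekend = as.factor(weekdayOrWeekend))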
head(activity)
## # A tibble: 6 x 5
##    steps  date       interval weekday weekdayOrWeekend
##    <dbl>  <date>        <dbl>   <dbl> <fct>
## 1 1.72   2012-10-01        0       1 weekday
## 2 0.340  2012-10-01        5       1 weekday
## 3 0.132  2012-10-01       10       1 weekday
## 4 0.151  2012-10-01       15       1 weekday
## 5 0.0755 2012-10-01       20       1 weekday
## 6 2.09   2012-10-01       25       1 weekday
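To see what the two preprocessing steps do, here is a minimal sketch on a small made-up table. The `toy` data and its values are hypothetical and are not taken from the activity file. The sketch fills each missing step count with the mean of its interval and then labels each date as a weekday or a weekend day.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two intervals observed on a Monday, a Saturday and a Sunday
toy <- tibble(
  date = as.Date(rep(c("2012-10-01", "2012-10-06", "2012-10-07"), times = 2)),
  interval = c(0, 0, 0, 5, 5, 5),
  steps = c(2, NA, 4, 10, 20, NA)
)

toyProcessed <- toy %>%
  # Fill each NA with the mean of the non-missing values in the same interval
  group_by(interval) %>%
  mutate(steps = ifelse(is.na(steps), mean(steps, na.rm = TRUE), steps)) %>%
  ungroup() %>%
  # With week_start = 1, Monday is 1 and Sunday is 7
  mutate(weekday = wday(date, week_start = 1)) %>%
  mutate(weekdayOrWeekend = factor(ifelse(weekday <= 5, "weekday", "weekend")))

toyProcessed
```

In interval 0 the missing value becomes 3, the mean of 2 and 4, and in interval 5 it becomes 15, the mean of 10 and 20. The Monday rows are labelled "weekday" and the Saturday and Sunday rows "weekend".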
Just like before, we want to compare step activity during weekdays with the activity during weekends.
The original plot looked as follows:
library(ggplot2, warn.conflicts = FALSE)
# Store the plot under its own name so that it does not shadow ggplot()
stepsPlot <- activity %>%
  group_by(interval, weekdayOrWeekend) %>%
  summarise(steps = mean(steps)) %>%
  ggplot(aes(x = interval, y = steps)) +
  geom_line() +
  facet_grid(rows = vars(weekdayOrWeekend)) +
  ylab("Number of steps") +
  xlab("Interval")
## `summarise()` regrouping output by 'interval' (override with `.groups` argument)
stepsPlot
To make the plot interactive, we convert it with ggplotly() from the plotly package.
library(plotly)
interactivePlot <- ggplotly(stepsPlot)
interactivePlot
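The conversion can also be tried in isolation. The following is a minimal sketch on hypothetical toy data, not the activity data, and the names `toySteps`, `staticPlot` and `widget` are made up for illustration. It passes the `tooltip` argument of `ggplotly()` so that the hover text shows only the x and y values.

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data standing in for the averaged step counts
toySteps <- data.frame(
  interval = rep(c(0, 5, 10), times = 2),
  steps = c(1, 4, 2, 0, 3, 6),
  weekdayOrWeekend = rep(c("weekday", "weekend"), each = 3)
)

# Static ggplot2 version, one panel per day type
staticPlot <- ggplot(toySteps, aes(x = interval, y = steps)) +
  geom_line() +
  facet_grid(rows = vars(weekdayOrWeekend))

# Interactive version; the tooltip is limited to the x and y aesthetics
widget <- ggplotly(staticPlot, tooltip = c("x", "y"))
widget
```

Keeping the static plot in its own variable means the same object can be printed as a regular figure or handed to `ggplotly()` without rebuilding it.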